Module 11 Animated and Interactive Graphics

Interactive and animated graphics are one of the major advantages of using the Rmarkdown ecosystem - because you can easily create web pages in markdown (without the pain of HTML), you aren’t limited by paper any more. We’ll cover two different technologies that allow you to create different types of interactive charts, graphs, and interfaces.

It is helpful to think about interactivity in a couple of different ways:

  1. What does it require? Do you need to be doing statistical calculations in the background, or can you precompute all of the data ahead of time?

  2. What type of activity or interactivity do you need?

    • Zoom in/out?
    • Provide additional information in response to user actions (mouseover, click)
    • Provide information over time (animation)
    • Keep track of a data point over multiple plots? (linked plots)
    • Keep track of one or more data points and change their appearance based on user interaction (brushing)
    • Allow the user to change the underlying statistical model or data?

(This is not a full list of all of the types of interactivity, just a few of the more common options)

In this section, we’ll cover two ways to easily create interactive graphics or applets in R. There are, of course, many others – many javascript libraries have R extensions of one form or another.

Animated and Interactive Graphics: Module Objectives

  • Create interactive charts with appropriate tools
  • Use Shiny to create interactive web applets

11.1 Plotly

Plotly is a graphing library that uses javascript to add interactivity to graphics. There are several different ways to create plotly graphs in R, but by far the easiest is ggplotly, which converts a ggplot to a plotly plot automatically (so you don’t have to specify most of the details).

11.1.1 ggplotly: ggplot2 to plotly conversions

Set up the data

if (!"plotly" %in% installed.packages()) install.packages("plotly")

if (!"tidytuesdayR" %in% installed.packages()) {
  devtools::install_github("thebioengineer/tidytuesdayR")
}

library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)
library(lubridate) # dates and times
library(tidytuesdayR) # get interesting data
library(plotly)
library(stringr)

# Load the data from TidyTuesday on May 12
full_data <- tt_load('2020-05-12')

volcano <- full_data$volcano
eruptions <- full_data$eruptions
events <- full_data$events
sulfur <- full_data$sulfur
trees <- full_data$tree_rings

Let’s try out plotly while doing a bit of exploratory data analysis on this dataset.

Cleaning up volcano

volcano <- volcano %>%
  filter(tectonic_settings != "Unknown") %>%
  separate(tectonic_settings, into = c("zone", "crust"), sep = "/", remove = F) %>%
  # Remove anything past the first punctuation character - that will catch (xx) and ?
  mutate(volcano_type = str_remove(primary_volcano_type, "[[:punct:]].*$"))

Let’s start by seeing whether the elevation of a volcano changes based on the type of zone it’s on - we might expect that Rift zone volcanos (where plates are pulling away from each other) might not be as high.

p <- volcano %>%
  ggplot(aes(x = zone, y = elevation)) + 
  geom_boxplot() + 
  coord_flip()
ggplotly(p)
But it doesn’t really look like there’s much difference.

Does volcano type makes a difference?

p <- volcano %>%
  ggplot(aes(x = elevation, color = volcano_type)) + 
  geom_density() + 
  # Rug plots show each observation as a tick just below the x axis
  geom_rug()
ggplotly(p)

Here, the interactivity actually helps a bit: we don’t need to use the legend to see what each curve corresponds to. We can see that submarine volcanoes are typically much lower in elevation (ok, duh), but also that subglacial volcanoes are found in a very limited range. If we double-click on a legend entry, we can get rid of all other curves and examine each curve one by one.

I added the rug layer after the initial bout because I was curious how much data each of these curves were based on. If we want only curves with n > 10 observations, we can do that:

p <- volcano %>%
  group_by(volcano_type) %>% mutate(n = n()) %>%
  filter(n > 10) %>%
  ggplot(aes(x = elevation, color = volcano_type)) + 
  geom_density() + 
  # Rug plots show each observation as a tick just below the x axis
  geom_rug(aes(text = paste0(volcano_name, ", ", country)))
## Warning: Ignoring unknown aesthetics: text
ggplotly(p)

If we want to specify additional information that should show up in the tooltip, we can do that as well by adding the text aesthetic even though geom_rug doesn’t take a text aesthetic. You may notice that ggplot2 complains about the unknown aesthetic I’ve added to geom_rug: That allows us to mouse over each data point in the rug plot and see what volcano it belongs to. So we can tell from the rug plot that the tallest volcano is Ojas de Salvado, in Chile/Argentina (I believe that translates to Eyes of Salvation?).

At any rate, there isn’t nearly as much variation as I was expecting in the elevation of different types of volcanoes.

ggplotly makes it very easy to generate plots that have a ggplot2 equivalent; you can customize these plots further using plotly functions that we’ll see in the next section. But first, try the interface out on your own.

Try it out

Conduct an exploratory data analysis of the eruptions dataset. What do you find?

My solution

head(eruptions)
## # A tibble: 6 x 15
##   volcano_number volcano_name eruption_number eruption_catego… area_of_activity
##            <dbl> <chr>                  <dbl> <chr>            <chr>           
## 1         266030 Soputan                22354 Confirmed Erupt… <NA>            
## 2         343100 San Miguel             22355 Confirmed Erupt… <NA>            
## 3         233020 Fournaise, …           22343 Confirmed Erupt… <NA>            
## 4         345020 Rincon de l…           22346 Confirmed Erupt… <NA>            
## 5         353010 Fernandina             22347 Confirmed Erupt… <NA>            
## 6         273070 Taal                   22344 Confirmed Erupt… <NA>            
## # … with 10 more variables: vei <dbl>, start_year <dbl>, start_month <dbl>,
## #   start_day <dbl>, evidence_method_dating <chr>, end_year <dbl>,
## #   end_month <dbl>, end_day <dbl>, latitude <dbl>, longitude <dbl>
summary(eruptions %>% mutate(eruption_category = factor(eruption_category)))
##  volcano_number   volcano_name       eruption_number
##  Min.   :210010   Length:11178       Min.   :10001  
##  1st Qu.:263310   Class :character   1st Qu.:12817  
##  Median :290050   Mode  :character   Median :15650  
##  Mean   :300284                      Mean   :15667  
##  3rd Qu.:343030                      3rd Qu.:18464  
##  Max.   :600000                      Max.   :22355  
##                                                     
##             eruption_category area_of_activity        vei       
##  Confirmed Eruption  :9900    Length:11178       Min.   :0.000  
##  Discredited Eruption: 166    Class :character   1st Qu.:1.000  
##  Uncertain Eruption  :1112    Mode  :character   Median :2.000  
##                                                  Mean   :1.948  
##                                                  3rd Qu.:2.000  
##                                                  Max.   :7.000  
##                                                  NA's   :2906   
##    start_year        start_month       start_day      evidence_method_dating
##  Min.   :-11345.0   Min.   : 0.000   Min.   : 0.000   Length:11178          
##  1st Qu.:   680.0   1st Qu.: 0.000   1st Qu.: 0.000   Class :character      
##  Median :  1847.0   Median : 1.000   Median : 0.000   Mode  :character      
##  Mean   :   622.8   Mean   : 3.451   Mean   : 7.015                         
##  3rd Qu.:  1950.0   3rd Qu.: 7.000   3rd Qu.:15.000                         
##  Max.   :  2020.0   Max.   :12.000   Max.   :31.000                         
##  NA's   :1          NA's   :193      NA's   :196                            
##     end_year      end_month         end_day         latitude      
##  Min.   :-475   Min.   : 0.000   Min.   : 0.00   Min.   :-77.530  
##  1st Qu.:1895   1st Qu.: 3.000   1st Qu.: 4.00   1st Qu.: -6.102  
##  Median :1957   Median : 6.000   Median :15.00   Median : 17.600  
##  Mean   :1917   Mean   : 6.221   Mean   :13.32   Mean   : 16.866  
##  3rd Qu.:1992   3rd Qu.: 9.000   3rd Qu.:21.00   3rd Qu.: 40.821  
##  Max.   :2020   Max.   :12.000   Max.   :31.00   Max.   : 85.608  
##  NA's   :6846   NA's   :6849     NA's   :6852                     
##    longitude      
##  Min.   :-179.97  
##  1st Qu.: -77.66  
##  Median :  55.71  
##  Mean   :  31.57  
##  3rd Qu.: 139.39  
##  Max.   : 179.58  
## 
# Historical (very historical) dates are a bit of a pain to work with, so I 
# wrote a helper function which takes year, month, and day arguments and formats
# them properly

fix_date <- function(yyyy, mm, dd) {
  # First, negative years (BCE) are a bit of a problem.
  neg <- yyyy < 0
  subtract_years <- pmax(-yyyy, 0) # Years to subtract off later
  # for now, set to 0
  year_fixed <- pmax(yyyy, 0) # this will set anything negative to 0
  
  # sometimes the day or month isn't known, so just use 1 for both. 
  # recorded value may be NA or 0.
  day_fixed <- ifelse(is.na(dd), 1, pmax(dd, 1))
  month_fixed <- ifelse(is.na(mm), 1, pmax(mm, 1))
  
  # Need to format things precisely, so use sprintf
  # %0xd ensures that you have at least x digits, padding the left side with 0s
  # lubridate doesn't love having 3-digit years. 
  date_str <- sprintf("%04d/%02d/%02d", year_fixed, month_fixed, day_fixed)
  # Then we can convert the dates and subtract off the years for pre-CE dates
  date <- ymd(date_str) - years(subtract_years)
}

erupt <- eruptions %>% 
  # Don't work with discredited eruptions
  filter(eruption_category == "Confirmed Eruption") %>%
  # Create start and end dates
  mutate(
    start_date = fix_date(start_year, start_month, start_day),
    end_date = fix_date(end_year, end_month, end_day),
    # To get duration, we have to start with a time interval, 
    # convert to duration, then convert to a numeric value
    duration = interval(start = start_date, end = end_date) %>% 
      as.duration() %>% 
      as.numeric("days"))
## Warning: Problem with `mutate()` input `start_date`.
## ℹ  1 failed to parse.
## ℹ Input `start_date` is `fix_date(start_year, start_month, start_day)`.
## Warning: 1 failed to parse.
## Warning: Problem with `mutate()` input `end_date`.
## ℹ  5895 failed to parse.
## ℹ Input `end_date` is `fix_date(end_year, end_month, end_day)`.
## Warning: 5895 failed to parse.

Let’s start out seeing what month most eruptions occur in…

# Note, I'm using the original month, so 0 = unknown
p <- ggplot(erupt, aes(x = factor(start_month))) + geom_bar()
ggplotly(p)
# I could rename some of the factors to make this pretty, but... nah

Another numerical variable is VEI, volcano explosivity index. A VEI of 0 is non-explosive, a VEI of 4 is about what Mt. St. Helens hit in 1980, and a VEI of 5 is equivalent to the Krakatau explosion in 1883. A VEI of 8 would correspond to a major Yellowstone caldera eruption (which hasn’t happened for 600,000 years). Basically, VEI increase of 1 is an order of magnitude change in the amount of material the eruption released.

# VEI is volcano explosivity index, 
p <- ggplot(erupt, aes(x = vei)) + geom_bar()
ggplotly(p)
## Warning: Removed 2270 rows containing non-finite values (stat_count).

We can also look at the frequency of eruptions over time. We’ll expect some historical bias - we don’t have exact dates for some of these eruptions, and if no one was around to write the eruption down (or the records were destroyed) there’s not going to be a date listed here.

p <- erupt %>%
  filter(!is.na(end_date)) %>%
  filter(start_year > 0) %>%

ggplot(aes(x = start_date, xend = start_date, 
                  y = 0, yend = duration, 
                  color = evidence_method_dating)) + 
  geom_segment() + 
  geom_point(size = .5, aes(text = volcano_name)) + 
  xlab("Eruption Start") + 
  ylab("Eruption Duration (days)") + 
  facet_wrap(~vei, scales = "free_y")
## Warning: Ignoring unknown aesthetics: text
ggplotly(p)
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

As expected, it’s pretty rare to see many eruptions before ~1800 AD, which is about when we have reliable historical records1 for most of the world (exceptions include e.g. Vestuvius, which we have extensive written information about).

p <- erupt %>%
  filter(!is.na(end_date)) %>%
  # Account for recency bias (sort of)
  filter(start_year > 1800) %>%
ggplot(aes(x = factor(vei), y = duration)) + 
  geom_violin() + 
  xlab("VEI") + 
  ylab("Eruption Duration (days)") + 
  scale_y_sqrt()
ggplotly(p)
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 3 rows containing non-finite values (stat_ydensity).
## Warning: Groups with fewer than two data points have been dropped.
It seems that the really big eruptions might be less likely to last for a long time, but it is hard to tell because there aren’t that many of them (thankfully).

11.1.2 plot_ly: Like base plotting, but interactive!

You can also create plotly charts that aren’t limited by what you can do in ggplot2, using the plot_ly function.

Plotly cheat sheet

We can start with a scatterplot of volcanoes along the Earth’s surface:

plot_ly(type = "scattergeo", lon = volcano$longitude, lat = volcano$latitude)
## No scattergeo mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

And then we can start customizing:

plot_ly(type = "scattergeo", lon = volcano$longitude, lat = volcano$latitude,
        mode = "markers",
        # Add information to mouseover
        text = ~paste(volcano$volcano_name, "\n", 
                      "Last Erupted: ", volcano$last_eruption_year),
        # Change the markers because why not?
        marker = list(color = "#d00000", opacity = 0.25)
        )

The plot_ly function is also pipe friendly. Variable mappings are preceded with ~ to indicate that the visual appearance changes with the value of the variable.

# Load RColorBrewer for palettes
library(RColorBrewer)

volcano %>% 
  group_by(volcano_type) %>% mutate(n = n()) %>%
  filter(n > 15) %>%
plot_ly(type = "scattergeo", lon = ~longitude, lat = ~latitude,
        mode = "markers",
        # Add information to mouseover
        text = ~paste(volcano_name, "\n", 
                      "Last Erupted: ", last_eruption_year),
        color = ~ volcano_type,
        # Specify a palette
        colors = brewer.pal(length(unique(.$volcano_type)), "Paired"),
        # Change the markers because why not?
        marker = list(opacity = 0.5)
        )
Plotly will handle some variable mappings for you, depending on which “trace” type (plot/geom) you’re using.

The plotly documentation often uses plyr and reshape2 – which are older versions of dplyr and tidyr. If you load plyr and reshape2, it may seriously mess up your day – a lot of the function names are the same. So, instead, here’s a shortcut: cast is pivot_wider and melt is pivot_longer. That should at least help with understanding what the code is doing.

If you do accidentally load plyr or reshape2, that’s fine: just restart your R session so that your loaded packages are cleared and you can start over. Or, if you must, you can reference a plyr function using plyr::function_name without loading the package – that’s a safe way to use the plotly demo code as-is.

Let’s explore traces a bit. According to the plotly documentation,

A trace is just the name we give a collection of data and the specifications of which we want that data plotted. Notice that a trace will also be an object itself, and these will be named according to how you want the data displayed on the plotting surface

In ggplot2 terms, it seems that a trace is somewhat akin to a geom.

trace0 <- rnorm(100, mean = 5)
trace1 <- rnorm(100, mean = 0)
trace2 <- rnorm(100, mean = -5)

data <- tibble(x = 1:100, trace0, trace1, trace2)

# Let's see how this goes with one trace
plot_ly(data, x = ~x) %>%
  add_trace(y = ~trace0, name = 'trace0', mode = 'lines')
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
# Adding some more traces
plot_ly(data, x = ~x) %>%
  add_trace(y = ~trace0, name = 'trace0', mode = 'lines') %>%
  add_trace(y = ~trace1, name = "trace1", mode = 'lines+markers') %>%
  add_trace(y = ~trace2, name = "trace2", mode = 'markers')
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter

But, if you want all of the variables to be shown with the same trace type, it’s probably easier to get to long form:

data %>%
  pivot_longer(matches("trace"), names_to = "trace", names_prefix = "trace", values_to = "y") %>%
  plot_ly(x = ~x, y = ~y, color = ~trace, mode = "lines+markers")
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter

There are many different trace types in plotly, but your best bet is to check the documentation to see what is available.

11.1.3 Animation

Plotly can also animate your plots for you.

library(classdata)
data(fbi)

fbi %>%
  mutate(State = factor(State),
         Rate_100k = Count/Population*100000) %>%
  filter(Type == "Aggravated.assault") %>%
  arrange(Year, State, Type) %>%
  plot_ly(
    x = ~State,
    y = ~Rate_100k,
    color = ~Type,
    frame = ~Year,
    type = "scatter",
    mode = "markers"
  ) 

Sometimes the animations get a bit trippy, don’t they?

You can even animate by something other than time, if you’re so inclined, though it’s not necessarily going to make sense if there isn’t any context shared between successive observations. So animating over space might make sense, but animating over a factor makes a lot less sense.

fbi %>%
  mutate(State = factor(State),
         Rate_100k = Count/Population*100000) %>%
  arrange(Year, State, Type) %>%
  plot_ly(
    x = ~Year,
    y = ~Rate_100k,
    color = ~Type,
    frame = ~State,
    type = "scatter",
    mode = "lines"
  ) 

There are other types of animations as well, including the ability to change plot formats, trace types, and more.

11.2 Leaflet maps

Leaflet is another javascript library that allows for interactive data visualization. We’re only going to briefly talk about it here, but there is extensive documentation that includes details of how to work with different types of geographical data, chloropleth maps, plugins, and more.

To explore the leaflet package, we’ll start out playing with a dataset of Bigfoot sightings assembled from the Bigfoot Field Researchers Organization’s Google earth tool

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   observed = col_character(),
##   location_details = col_character(),
##   county = col_character(),
##   state = col_character(),
##   season = col_character(),
##   title = col_character(),
##   date = col_date(format = ""),
##   classification = col_character(),
##   geohash = col_character(),
##   precip_type = col_character(),
##   summary = col_character()
## )
## See spec(...) for full column specifications.
if (!"leaflet" %in% installed.packages()) install.packages("leaflet")

library(leaflet)
library(readr)

bigfoot_data <- read_csv("https://query.data.world/s/egnaxxvegdkzzrhfhdh4izb6etmlms")

We can start out by plotting a map with the location of each sighting. I’ve colored the points in a seasonal color scheme, and added the description of each incident as a mouseover label.

bigfoot_data %>%
  filter(classification == "Class A") %>%
  mutate(seasoncolor = str_replace_all(season, c("Fall" = "orange", 
                                                 "Winter" = "skyblue", 
                                                 "Spring" = "green", 
                                                 "Summer" = "yellow")),
         # This code just wraps the description to the width of the R terminal
         # and inserts HTML for a line break into the text at appropriate points
         desc_wrap = purrr::map(observed, ~strwrap(.) %>% 
                                  paste(collapse = "<br/>") %>%
                                  htmltools::HTML())) %>%
leaflet() %>%
  addTiles() %>%
  addCircleMarkers(~longitude, ~latitude, color = ~seasoncolor, label = ~desc_wrap)
## Warning in validateCoords(lng, lat, funcName): Data contains 459 rows with
## either missing or invalid lat/lon values and will be ignored

Of course, because this is an interactive map library, we aren’t limited to any one scale. We can also plot data at the city level:

if(!"nycsquirrels18" %in% installed.packages()) {
  devtools::install_github("mine-cetinkaya-rundel/nycsquirrels18")
}
  
library(nycsquirrels18)

data(squirrels)

head(squirrels)
## # A tibble: 6 x 35
##    long   lat unique_squirrel… hectare shift date       hectare_squirre… age  
##   <dbl> <dbl> <chr>            <chr>   <chr> <date>                <dbl> <chr>
## 1 -74.0  40.8 13A-PM-1014-04   13A     PM    2018-10-14                4 <NA> 
## 2 -74.0  40.8 15F-PM-1010-06   15F     PM    2018-10-10                6 Adult
## 3 -74.0  40.8 19C-PM-1018-02   19C     PM    2018-10-18                2 Adult
## 4 -74.0  40.8 21B-AM-1019-04   21B     AM    2018-10-19                4 <NA> 
## 5 -74.0  40.8 23A-AM-1018-02   23A     AM    2018-10-18                2 Juve…
## 6 -74.0  40.8 38H-PM-1012-01   38H     PM    2018-10-12                1 Adult
## # … with 27 more variables: primary_fur_color <chr>, highlight_fur_color <chr>,
## #   combination_of_primary_and_highlight_color <chr>, color_notes <chr>,
## #   location <chr>, above_ground_sighter_measurement <chr>,
## #   specific_location <chr>, running <lgl>, chasing <lgl>, climbing <lgl>,
## #   eating <lgl>, foraging <lgl>, other_activities <chr>, kuks <lgl>,
## #   quaas <lgl>, moans <lgl>, tail_flags <lgl>, tail_twitches <lgl>,
## #   approaches <lgl>, indifferent <lgl>, runs_from <lgl>,
## #   other_interactions <chr>, zip_codes <dbl>, community_districts <dbl>,
## #   borough_boundaries <dbl>, city_council_districts <dbl>,
## #   police_precincts <dbl>
squirrels %>%
  mutate(color = tolower(primary_fur_color)) %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(~long, ~lat, color = ~color)

We can also plot regions, instead of just points. I downloaded a dataset released by the US Forest Service, Bailey’s Ecoregions and Subregions dataset, which categorizes the US into different climate and ecological zones.

To map colors to variables, we have to define a color palette and variable mapping ourselves, and pass that function into the leaflet object we’re adding.

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.5-16, (SVN revision 1050)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
## Path to GDAL shared files: /usr/share/gdal/2.2
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ shared files: (autodetected)
## Linking to sp version:1.4-2
ecoregions <- readOGR("data/Bailey_s_Ecoregions_and_Subregions_Dataset.geojson")
## OGR data source with driver: GeoJSON 
## Source: "/home/susan/Projects/Class/unl-stat850/stat850-textbook/data/Bailey_s_Ecoregions_and_Subregions_Dataset.geojson", layer: "Bailey_s_Ecoregions_and_Subregions_Dataset"
## with 3072 features
## It has 12 fields
# Define a palette
region_pal <- colorFactor(c("#E67E22", "#0B5345", "#229954", "#B3B6B7"), ecoregions$DOMAIN)

ecoregions %>%
leaflet() %>%
  addTiles() %>%
  addPolygons(stroke = TRUE, fillOpacity = 0.25, 
              fillColor = ~region_pal(DOMAIN), color = ~region_pal(DOMAIN), label = ~SECTION)

Try it out

Download the Shapefiles for the 116th Congress Congressional Districts. Unzip the file and read it in using the code below (you’ll have to change the file path).

Use the MIT Election Data and Science Lab’s US House election results2, and merge this data with the shapefiles to plot the results of the 2018 midterms in a way that you think is useful (you can use any of the available data).

Some notes: - FIPS codes are used to identify the state and district, with 00 indicating at-large districts (one district for the state) and 98 indicating non-voting districts. - If you would like to add in the number of citizens of voting age, you can get that information here but you will have to do some cleaning in order to join the table with the others. - Minnesota’s Democratic-farmer-labor party caucuses with the Democrats but maintains its name for historical reasons. You can safely recode this if you want to.

library(sf)
## Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
# Read in the districts
congress_districts <- st_read("data/116_congress/cb_2018_us_cd116_5m.shp")
## Reading layer `cb_2018_us_cd116_5m' from data source `/home/susan/Projects/Class/unl-stat850/stat850-textbook/data/116_congress/cb_2018_us_cd116_5m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 441 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
## CRS:            4269
# Read in the results
election_results <- read_csv("data/1976-2018-house2.csv") %>%
  filter(year == 2018) %>%
  mutate(state_fips = sprintf("%02d", state_fips), 
         district = sprintf("%02d", district))
## Parsed with column specification:
## cols(
##   year = col_double(),
##   state = col_character(),
##   state_po = col_character(),
##   state_fips = col_double(),
##   state_cen = col_double(),
##   state_ic = col_double(),
##   office = col_character(),
##   district = col_double(),
##   stage = col_character(),
##   runoff = col_logical(),
##   special = col_logical(),
##   candidate = col_character(),
##   party = col_character(),
##   writein = col_logical(),
##   mode = col_character(),
##   candidatevotes = col_double(),
##   totalvotes = col_double(),
##   unofficial = col_logical(),
##   version = col_double()
## )
# Clean up congress districts
congress_districts <- congress_districts %>%
  # Convert factors to characters
  mutate(across(where(is.factor), as.character)) %>%
  # Handle at-large districts
  mutate(district = ifelse(CD116FP == "00", "01", CD116FP))

One solution

library(sf)
library(htmltools) # to mark labels as html code

# Read in the results
election_results <- read_csv("data/1976-2018-house2.csv") %>%
  filter(year == 2018) %>%
  mutate(state_fips = sprintf("%02d", state_fips), 
         district = sprintf("%02d", district)) %>%
  group_by(state, state_fips, state_po, district, stage) %>%
  arrange(candidatevotes) %>%
  mutate(pct = candidatevotes/totalvotes) %>%
  # Keep the winner only
  filter(pct == max(pct)) %>%
  # Fix Minnesota
  mutate(party = ifelse(party == "democratic-farmer-labor", "democrat", party))
## Parsed with column specification:
## cols(
##   year = col_double(),
##   state = col_character(),
##   state_po = col_character(),
##   state_fips = col_double(),
##   state_cen = col_double(),
##   state_ic = col_double(),
##   office = col_character(),
##   district = col_double(),
##   stage = col_character(),
##   runoff = col_logical(),
##   special = col_logical(),
##   candidate = col_character(),
##   party = col_character(),
##   writein = col_logical(),
##   mode = col_character(),
##   candidatevotes = col_double(),
##   totalvotes = col_double(),
##   unofficial = col_logical(),
##   version = col_double()
## )
# Read in the districts
congress_districts <- st_read("data/116_congress/cb_2018_us_cd116_5m.shp") %>%
  mutate(geometry = st_transform(geometry, crs = st_crs("+proj=longlat +datum=WGS84")))
## Reading layer `cb_2018_us_cd116_5m' from data source `/home/susan/Projects/Class/unl-stat850/stat850-textbook/data/116_congress/cb_2018_us_cd116_5m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 441 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
## CRS:            4269
# Clean up congress districts
congress_districts <- congress_districts %>%
  # Convert factors to characters
  mutate(across(where(is.factor), as.character)) %>%
  # Handle at-large districts
  mutate(district = ifelse(CD116FP == "00", "01", CD116FP))

# Merge
congress_districts <- congress_districts %>% 
  left_join(election_results, by = c("STATEFP" = "state_fips", "CD116FP" = "district")) %>%
  mutate(party = factor(party, levels = c("republican", "democrat")),
         short_party = ifelse(party == "republican", "R", "D"),
         label = paste0(state_po, "-", district, candidate, " (", short_party, ")"))

# Define a palette
region_pal <- colorFactor(c("#e9141d", "#0015bc"), congress_districts$party)

congress_districts %>%
leaflet() %>%
  addTiles() %>%
  addPolygons(stroke = TRUE, fillOpacity = ~pct/2, 
              # still want to see what's underneath, even in safe districts
              fillColor = ~region_pal(party), color = ~region_pal(party), 
              label = ~label)

11.3 Shiny

https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-26/readme.md

Before we get started on Shiny, take a few minutes and poke around the RStudio Shiny user showcase. It helps to have some motivation, and to get a sense of what is possible before you start learning something. One of the more amusing ones I found was an exploration of lego demographics.

Shiny is a framework for building interactive web applications in R. Unlike plotly and other graphics engines, Shiny depends on an R instance on a server to do computations. This means Shiny is much more powerful and has more capabilities, but also that it’s harder to share and deploy - you have to have access to a web server with R installed on it. If you happen to have a server like that, though, Shiny is pretty awesome. RStudio runs a service called shinyapps.io that will provide some limited free hosting, as well as paid plans for apps that have more web traffic, but you can also create Shiny apps for local use - I often do this for model debugging when I’m using neural networks, because they’re so complicated.

RStudio has a set of well produced video tutorials to introduce Shiny. I’d recommend you at least listen to the introduction if you’re a visual/audio learner (the whole tutorial is about 2 hours long). There is also a written tutorial if you prefer to learn in written form (7 lessons, each is about 20 minutes long).

I generally think it’s better to send you to the source when there are well-produced resources, rather than trying to rehash something to put my own spin on it.

One other interesting feature to keep in mind when using Shiny - you can integrate Shiny reactivity into Rmarkdown by adding runtime: shiny to the markdown header.

11.4 General References

11.4.1 Other interactive tools

  • htmlwidgets - a generic wrapper for any Javascript library (htmlwidgets is used under the hood in both Leaflet and Plotly R integration)

  • dash - Another dashboard program supported by plotly. dash is the python equivalent of shiny, but also has R integration (though I’m not sure how well it’s supported).

11.4.2 Debugging


  1. There are obviously exceptions - we can figure out the exact date and approximate time that there was an earthquake along the Cascadia subduction zone based on a combination of oral histories of the indigenous people and records of a massive tsunami in Japan Excellent read, if you’re interested, and the Nature paper.↩︎

  2. Alternately, you can find the rehosted file here.↩︎